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ABSTRACT 


We  investigate  the  minimal  number  of  matrix-vector 
multiplications  to  approximately  solve  a  linear  system.  The 
minimal  number  of  multiplications  depends  on  the  properties  of  a 
class  of  problems  such  as  symmetry,  positive  definiteness,  and 
bound  on  condition  number.  For  different  classes  of  problems 
we  obtain  the  minimum  exactly  or  almost  exactly  and  establish 
which  algorithms  are  optimal,  that  is,  attain  the  minimum. 
Furthermore,  we  obtain  quantitative  results  on  how  the  lack 
of  certain  properties  increases  the  minimum.  . 
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1 .  INTRODUCTION 


Many  papers  deal  with  the  iterative  solution  of  a  large 
linear  system  Ax  =  b.  Typically  one  constructs  an  algorithm  <p 
which  generates  a  sequence  {x^}  converging  to  the  solution 
a  =  A_1b;  the  calculation  of  x^  requires  k  matrix-vector 
multiplications  and  x^  lies  in  a  subspace  spanned  by  b,  Ab, 

V 

. ..,  A  b.  The  algorithm  4>  is  often  chosen  to  guarantee  good 
convergence  properties  of  the  sequence  {x^}.  In  some  cases, 

$  is  defined  to  minimize  some  measure  of  the  error  in  a  restrictive 
class  of  algorithms.  For  instance,  let  this  class  be  defined 
as  the  class  of  "polynomial"  algorithms,  i.e.,  nt-x^  =  W^tAja, 
where  is  a  polynomial  of  degree  at  most  k  and  wk(0)  =  1* 

Then  choosing.  as  the  polynomial  minimizing  the  k-th  residual 

mr 

||  Ax^-bH  =||  Wj^  (A)  a  1 1  ,  we  obtain  the  minimal  residual  algorithm,  <f> 

If  A  is  symmetric,  positive  definite  and  a  =  1/||  A  1 1|  ,  b  =  ||  A|| 
are  known,  then  choosing  as  the  polynomial  minimizing 

ch 

max{  |w^(t)  |  :t  e  [a,b]},  we  obtain  the  Chebvshev  algorithm,  <t> 

It  seems  to  us  that  this  procedure  is  unnecessarily 
restrictive.  It  is  not  clear,  a  priori,  why  an  algorithm  has  to 
construct  of  the  form  a-x^  =  W^(A)a.  One  might  hope  that 

by  not  restricitng  the  class  of  algorithms  it  is  possible  to  obtain 
better  algorithms. 

In  this  paper  we  do  not  impose  any  restriction  on  the  class 
of  algorithms  ♦  which  construct  using  the  information 

(A, b)  =  [b,Ab, ...,  A*bJ.  Assuming  that  the  matrix  A  belongs 
to  a  given  class  of  nxn  nonsingular  matrices  F,  we  measure 
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the  goodness  of  an  algorithm  4  by  the  minimal  number  of  steps 
k  which  are  necessary  to  find  such  that  J|  Ax^-b||  <  e  for 

a  given  positive  e  c  (0,1].  (We  assume  that  |(  b||  =1.) 

We  define  two  types  of  optimality.  An  algorithm  <J>  is  said  to 
be  strongly  optimal  if  it  requires  the  minimal  number  of  steps 
for  every  A  from  the  class  F.  An  algorithm  $  is  said  to  be 
optimal  if  it  requires  the  minimal  number  of  steps  for  a  worst 
case  A  from  F.  (For  the  precise  definition  see  Section  2.) 

The  main  result  of  this  paper  is  that  the  minimal  residual 
algorithm  is  almost  strongly  optimal  provided  that  the  class  F 
is  orthogonally  invariant,  i.e.,  A  e  F  implies  QAQT  e  F  for 
any  orthogonal  Q.  We  show  that  the  assumption  of  orthogonal 
Invariance  is  essential.  That  is,  if  F  is  not  orthogonally 

invariant,  then  the  optimality  properties  of  the  $mr  algorithm 

*  « 

disappear. 

Usually  the  class  F  depends  on  a  parameter.  For  instance, 
we  consider  the  classes  F^ ,  F2  and  F^  of  nxn  matrices  with 
condition  number  bounded  by  a  given  M,  M2  1.  The  class  F^ 
consists  of  symmetric  and  positive  definite  matrices,  the  class  f2 
differs  from  Fj  by  the  lack  of  positive  definitess,  and  the  class 
F3  differs  from  F2  by  the  lack  of  symmetry.  Note  that  the 
minimal  residual  algorithm,  even  though  it  is  almost  strongly 
optimal  for  any  value  of  M,  does  not  use  M  for  the  construction 
of  the  sequence  (x^). 

We  also  prove  that  if  e  is  not  too  small,  the  Chebyshev 
algorithm  is  optimal  but  not  strongly  optimal  for  the  class  F^ 
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of  nxn  matrices  of  the  form  A  =  I-B  where  B  is  symmetric  and 
||  B 1 1  s  p  with  p  <  1.  In  contrast  to  the  previous  case,  the 
algorithm  depends  essentially  on  n.  We  also  consider  the 
class  Fg  which  differs  from  F^  by  the  lack  of  symmetry  of 
matrices  B.  We  establish  the  asymptotic  optimality  of  the 
successive  approximation  algorithm  for  this  class. 

For  all  these  five  classes  we  find  the  optimal  class  index 
which  is  defined  as  the  number  of  steps  required  by  an  optimal 
algorithm  to  find  xk  with  ||  Axk~b||  <  e.  We  are  able  to  conclude 
precisely  how  the  lack  of  positive  definiteness  and/or  symmetry 
increases  the  optimal  class  index. 

For  the  optimal  algorithms  considered  in  this  paper  we  can 
also  guarantee  that  the  construction  of  x^  requires  a  close  to 
minimal  number  of  arithmetic  operations  and  storage.  From  these 
properties  it  follows  that  they  are  almost  optimal  complexity 
algorithms,  i.e.  algorithms  which  minimize  the  total  cost  (measured 
by  the  number  of  arithmetic  operations)  of  finding  a  vector  x 
such  that  ||  Ax-b||  <  e  . 

In  the  first  six  sections  of  this  paper  we  consider  optimal 
algorithms  for  finding  a  vector  x  such  that  the  residual  vector 
Ax-b  has  norm  less  than  e.  In  Section  7  we  introduce  a  family 
of  approximation  criteria  for  choosing  a  vector  x.  We  generalize 
the  previous  optimality  results.  Among  our  results  we  show  that 
the  conjugate  gradient  algorithm  is  almost  strongly  optimal,  that 
if  e  is  not  too  small  then  the  Chebyshev  algorithm  is  optimal 
(but  not  strongly  optimal)  for  the  class  F4  with  an  arbitrary 
choice  of  the  approximation  criterion,  and  the  successive  approximation 
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algorithm  is  optimal  (but  not  strongly  optimal)  for  the  class 
F  for  a  certain  choice  of  the  approximation  criterion. 

We  stress  that  with  a  few  exceptions  the  results  of  this 
paper  are  not  asymptotic.  That  is,  we  know  the  exact  values  of 
the  optimal  class  indices  to  within  at  moat  unity  for  every 
c  from  the  interval  (0,1].  This  is  in  a  sharp  contrast  to  many 
results  in  algebraic  complexity  where  only  small  c  results 
can  be  established. 

The  problems  and  proof  techniques  of  this  paper  follow  the 
information  approach  of  the  monograph  by  Traub  and  Wofniakowski 
[80].  There  are  many  interesting  relations  between  the  optimality 
results  of  this  paper  and  the  general  results  of  the  monograph. 

For  the  reader ' s  convenience  we  do  not  use  the  general  terminology 
and  results  of  Traub  and  Wofniakowski  [80]. 

For  simplicity  we  consider  only  the  real  case,  although  the 
generalization  to  the  complex  case  is  straightforward. 

We  summarize  the  contents  of  the  paper.  Section  2  presents 
the  basic  concepts  of  strongly  optimal,  optimal,  and  almost 
strongly  optimal  algorithms.  The  minimal  residual  algorithm  is 
defined  in  Section  3. 

In  Section  4  we  establish  the  main  result  that  the  minimal 
residual  algorithm  is  almost  strongly  optimal  provided  the  class 
F  is  orthogonally  invariant.  In  Sections  5  and  6  we  find  the 
optimal  class  index  for  five  orthogonally  invariant  classes. 

Section  7  deals  with  generalized  criteria.  The  generalized 
minimal  algorithm  is  defined  and  proven  to  be  almost  strongly 
optimal.  Section  8  deals  with  the  complexity  of  finding  an 
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e-approximation.  In  Section  9  we  briefly  compare  the  Gauss 
elimination  algorithm  with  the  minimal  residual  algorithm. 

In  the  final  section  we  pose  some  open  problems  concerning  the 
optimality  properties  of  the  information  studied  in  this  paper. 
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2.  BASIC  CONCEPTS 

Let  F  be  a  subclass  of  the  class  of  n  x  n  nonsingular 
real  matrices.  Let  b  be  a  given  nxl  real  vector  such  that 
||  b  ||  *  / (b,  b)  =  1.  For  a  given  positive  c,  c  >  1,  we  seek  a 
real  vector  x  whose  residual  has  norm  less  than  e,  i.e., 

(2.1)  ||  A  X  -  b  II  <  e  ,  A  e  F  . 

We  call  x  an  e-approximation.  Since  b  is  normalized  to  unity, 

(2.1)  measures  the  relative  error  of  the  residual  vector.  In 
Section  7  we  discuss  the  problem  of  finding  x  with  relative 
error  less  than  e  in  a  variety  of  norms. 

To  find  an  e-approximation  we  need  some  information  about 
the  matrix  A.  We  define  an  information  operator  Nk  as 

(2.2)  Nk(A,b)  =  Cb,  Ab,  A2b, . . . ,  Akb  I 

for  k  =  1,  2, . . .  . 

Remark  2.1 

Let  zQ  =  b,  =  Azi  for  i  =  l,  2,...,  k-1.  Then  (2.2) 
can  be  rewritten  as 

(2.3)  Nk(A,b)  =  [z0,  AZq,  Az j ,  . .  . ,  Az^jJ. 

Thus  the  computation  of  Nk(A,b)  requires  k  matrix  -  vector 

multiplications.  If  A  is  sparse  Nk(A,b)  can  be  computed  in 

2 

time  proportional  to  kn  rather  than  kn  .  Usually  instead  of 
computing  N^tA.b)  we  compute 
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N^(A,b)  =  [  b.  AWj ,  Aw 2 .  Awk  ] 

where  w^  is  a  linear  combination  of  b,  Ab,  .  .  . ,  A*~*b  for 
i*l,  2,...,  k.  It  is  easy  to  show  that  all  the  results  of 
this  paper  also  hold  for  the  information  operator  N£.  ■ 

Remark  2.2 

Note  that  z^  in  (2.3)  is  a  function  of  previously  com¬ 
puted  vectors.  Thus,  Nk  is  an  example  of  an  adaptive  infor¬ 
mation  operator.  See  Section  9  where  we  discuss  adaptive 
Information  operators  in  general.  I 

We  define  an  algorithm  ♦  *  {  $k  }  as  a  sequence  of  map¬ 
pings  (F,  |£n)  TRn.  The  algorithm  $  generates  the 

sequence  xk  «  (Nk  (A, b) )  based  on  the  information  Nk(A,b). 

We  are  interested  in  the  smallest  value  of  k  for  which  xk 
satisfies  (2.1),  i.e.,  ||  Ax^  -  b  |J  <  e  .  In  general,  there 

exist  many  different  matrices  A  from  F  which  share  the  same 
Information  as  A,  i.e.,  Nk(A,b)  ■  Nk(A,b).  Thus  xk  =  $k(Nk(A,b)) 
**  ♦k^Mk^'b^)  mu8t  satisfy  (2.1)  for  A  and  A.  Define 

(2.4)  V(yk)  »  {A  :  A  «  F,  Nk(A,b)«yk}.  yR  =  Nk(A,b). 

Let 

(2.5)  k(  +  ,  A)  -  min  {  k  i  ||  Axk-b  ||  <  c  ,  V  Ac  V(yR)) 

be  the  matrix  index  of  *  .  (if  the  set  of  k  in  (2.5)  is  enf»ty. 

we  set  k($,A)  »■♦*•.)  Let 

(2.6)  k(6,  F)  -  sup  k($,  A) 

AtF 
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be  the  class  index  of  <t> . 

Thus,  the  matrix  index  of  $  denotes  the  minimal  number 
of  steps  required  to  find  an  c— approximation  using  the  algo¬ 
rithm  4*  for  all  matrices  A  from  F  which  share  the  sane 
information  as  A.  The  class  index  of  $  denotes  the  same  con¬ 
cept  for  the  hardest  problem. 

We  seek  algorithms  with  minimal  indices.  Let 

(2.7)  k(A)  =  min  k(0,A) 

♦ 

be  the  optimal  matrix  index  and  let 

(2.8)  k (F)  =  max  k(A)  (=  min  k(<f>,  F)  ) 

AeF  0 

be  the  optimal  class  index. 

Remark  2.3 

It  is  possible  that  k(A)  <<  k(F).  For  instance,  assume 
that  Ab  =  b.  Then,  of  course,  setting  x^  =  ^(y^)  =b  we  have 
Ax^  =  b  for  AcV(y^).  Thus  k(A)  =  l  for  every  e.  As  we  shall 
see  later  k(F)  can  be  equal  to  n. 

Thus  even  if  the  optimal  class  index  is  larg«  it  can  hap¬ 
pen,  due  to  favorable  properties  of  A  and  b,  that  the  optimal 
matrix  index  is  small.  The  algorithms  with  small  matrix  index 
are  therefore  very  useful  for  applications.  This  motivates  our 
interest  in  algorithms  with  small  matrix  index.  I 


We  are  ready  to  introduce  two  concepts  of  optimal  algorithms. 
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An  algorithm  $  is  called  strongly  optimal  iff 

(2.9)  k(4>  ,  A)  =  k(A)  ,  V  A  t  F 
and  is  called  optimal  iff 

(2.10)  k($  ,  F)  =»  k (F)  . 

We  can  sometimes  establish  that  the  matrix  or  class  index 
of  an  algorithm  is  slightly  larger  than  the  optimal  index.  It 
is  convenient  to  introduce  the  concepts  of  almost  strongly 
optimal  algorithm  and  almost  optimal  algorithm  as  follows.  An 
algorithm  4>  is  almost  strongly  optimal  iff 

(2.11)  k(  $,  A)  s  k(A)  +  c  ,  V  A  c  F, 

and  is  almost  optimal  iff 

(2.12)  k ( $  ,  F)  s  k (F)  +  c 
for  some  small  integer  c. 

Thus  an  almost  strongly  optimal  algorithm  requires  at  most 
c  more  steps  than  a  strongly  optimal  one.  Usually  k(A)  >>  c 
and  therefore  an  almost  strongly  optimal  algorithm  is  as  use¬ 
ful  in  practice  as  a  strongly  optimal  one. 

Remark  2.^ 

All  concepts  introduced  in  this  section  also  depend  on  the 
size  n,  the  information  the  vector  b  and  e.  To  simplify 
notation  and  terminology  we  do  not  make  this  explicit  but  the 
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reader  should  keep  in  mind  that  all  the  results  are  relative  to 
n,  Nk'  k  an<*  e* 

Sometimes  we  shall  need  to  show  the  dependence  on  b. 

Then  we  shall  write  k(4  ,  A,  b)  and  k($  ,  F,  b)  instead  of 
k{$  ,  A)  and  k(4  ,  F)  ,  respectively.  g 

Remark  2.5 

In  most  of  this  paper  we  focus  on  the  minimal  number  ot  stei 
k(F)  required  to  find  an  e-approximation.  Of  course,  we  also 
want  to  minimize  the  complexity  (the  cost)  of  finding  an 
e -approximation .  In  Section  8  we  derive  very  tight  bounds  on 
the  complexity  of  this  problem  and  we  show  that  the  complexity 
depends  primarily  on  k(F) .  g 

We  conclude  this  section  by  showing  that 
(2.13)  k(F)  s  n. 

Indeed,  assume  k=n.  Since  b,  Ab,  ...,  Anb  are  linearly  de¬ 
pendent  and  A  is  nonsingular,  there  exist  numbers  c^,  cn 

such  that 

b  =  c.  Ab  +. . .+  c  Anb  =  A (c.  b  +. . .  +  c  An-1b)  . 
in  in 

Setting  xn  =  4  (N  (A,b))=  c,  b  + . . .  +  c  An-1  b  we  find  that 
n  n  n  1  n 

II  A  xn  "  b  I!  ~  0*  This  implies  (2.13).  As  we  shall  see  later 
there  exist  many  interesting  classes  F  for  which  k(F)  is  much 
less  than  n  for  reasonable  values  of  c. 


2.6 


and  k{$,  F)  respectively.  From  (2.13)  we  conclude  that  these 
mimina  exist.  Thus,  k(A)  and  k(F)  are  well  defined.  ■ 
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3.  MINIMAL  RESIDUAL  ALGORITHM 

In  this  section  we  derive  the  minimal  residual  algorithm. 

Let 

N^  (A,  b)  =  [  Zq,  z^ .  zkJ  with  z.^  =  AXb. 

Knowing  the  vectors  z^  we  define  ,  .  .  . ,  c*  as  the  coef¬ 
ficients  which  minimize  the  norm  of  the  residual  in  the  space 
spanned  by  z^#z2,...,zk.  Thus 

(3.1)  ||  b  -  c^  z^  - .  .  .  -  ck  z ^  [ |  =  min  ] <  b  -  -  -  zk 

c . 

l 

The  minimal  residual  algorithm  briefly  the  mr 

algorithm,  is  defined  as 

(3.2)  xk  =  (Nk(A.b))  =  cjb+  ...  +c*  Ak_1  b  . 

Note  that  xk  =  A-1  (cj  z^ c£  zk)  .  The  vector 

♦  ■  ^  It  T 

c  =  [  c^  , . . . ,  ck  j  satisfies  the  linear  equations 

(3.3)  M  c*  =  g 

where  M  =  (  (z^  ,  z j  ) )  and  g  =  [  (z^, b)  ,  . . . ,  (zk»  b)  JT.  The  matrix 
M  is  nonsingular  iff  z^,  z2,...,  zk  are  linearly  independent. 

If  z^,  z2>...,  zk  are  linearly  dependent  then  b  belongs  to  the 
space  {z^, . . « ,  zk  ^ }  and  Axk  —  b. 

Let  P  be  a  polynomial  of  degree  at  most  k  such  that 
P(0)  =  0.  Let  Ilk  be  the  class  of  such  polynomials.  Then  (3.1) 
can  be  rewritten  as 

||  (  I  -  (A) )  b  ||  *  inf  ||  (I  -  P  (A) )  b  || 

K  P€nk 


(3.4) 
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where  P*(t)  =  c*  t  +. . . +  c*  tk  c  nk . 

If  A  is  symmetric  and  postiive  definite  then  the  mr 
algorithm  is  a  variant  of  the  conjugate  gradient  iteration. 
(See  for  instance  Stiefel  [58].}  In  this  case  it  is  known 
that  the  polynomials 


W*(t>  =  1  -  P*(t),  W* (0)  =  1, 


are  orthogonal. 


(3.5)  (W*,W*) 

for  k  /  i  where 


m 


l 

j=l 


Xj  W*(Xj) 


0 


m 

b  =  l  C 

j  =  l  J  J 


with  Cj  being  an  eigenvector  of  A  associated  with  the 
eigenvalue  Aj,  A£j  «=  Aj  II  ||  *  1,  0  <  A^^  <  x2  <  *  *  Xm  where 

m  s  n  and  c^  /  0  for  j  =  1,  2, . . . ,  m. 

Equation  (3.5)  implies  that 


(3.6)  (Ark  ,  rL)  =0 

where  r j  =  A  Xj  -  b  is  the  residual  vector. 

There  are  many  efficient  ways  to  compute  xk  for  symmetric 
positive  definite  matrices.  For  instance  xk  can  be  found  as 
follows.  Let  Xq  *  0.  For  i  =  0,  l,...,k-l  define 


-3.3- 


The  residual  vectors  r^  satisfies  a  similar  recurrence  re¬ 
lation  as  x±<  i  .e.  ,  ri  +  i  =  ri  +  q  i-1  (ri  “  ri-l^  ~  Ari  *  ' 

Note  that  xi+1  defined  by  (3.7)  is  a  linear  combination  of 
b,  Ab, ....  Aib  and  its  computation  requires  only  the  know¬ 
ledge  of  the  vectors  b,  Ab, . . .  ,  Ai  +  1  b.  See  Remark  2.1. 
Roundoff -error  analysis  of  a  class  of  conjugate  gradient 
algorithms  including  some  information  on  the  rar  algorithm 
can  be  found  in  Wofniakowski  f 80 1 . 


Remark  3.1 

We  assume  that  the  initial  approximation  xq  =  0.  This 
assumption  is  not  restrictive.  Indeed,  let  x^  be  a  nonzero 
approximation  and  let  c=  ||b-Ax0(|/  0.  Then  we  apply  the  mr 
algorithm  (3.7)  and  (3.8)  to  the  system  Ax =  b'  with 
b'  =  (b  -  AxQ)/c.  If  we  find  x’  such  that  1 1 Ax*  -b'  ||  <  e/c 
then  x  *  ex'  +  Xq  is  an  c-approximation  to  the  original  system 
since  ||  Ax  -  b  ||  <  c  .  ■ 


i 
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If  A  is  symmetric  then  the  matrix  M  in  (3.3)  is  Toeplitz 

and  a  recent  algorithm  due  to  Brent  [78],  and  Yun  and  Gustavson 

[79]  can  be  employed  to  find  the  vector  c*  in  time  proportion- 
2 

al  to  k log  k. 

We  end  this  section  by  a  remark  on  the  matrix  index  of  the 
mr  algorithm.  Let  AcV(yk).  (See  (2.4)).  Then  A*  b  =  A*  b 


for  i  =  1 ,  2,...,  k.  Due  to  (3.2)  we  have 

(3.9)  Axk-b  =  Axk-b,  VAe  V(yk) 
and  (2.5)  can  now  be  simplified  to 

(3.10)  k(*mr,A)  -  min  (k  :  ||  Axk  -  b  ||  <  e}. 

It  is  obvious  that  xR  =  A-1  b  which  implies  that 

(3.11)  k(*mr  ,  A)  5  n. 

If  A  is  symmetric  and  positive  definite  then  (3.5)  implies 


xm  =  A-1b  and  x^  /  A  *b  for  i  <  m.  Thus  k  ( 4> mr  ,  A)  i  m  and 
for  sufficiently  small  e,  k($  mr  ,  A)  ■  m. 
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4.  OPTIMALITY  OF  THE  MR  ALGORITHM 

In  this  section  we  study  optimality  properties  of  the  mr 
algorithm  defined  by  (3.2).  As  we  shall  see  the  mr  algorithm 
is  an  almost  strongly  optimal  algorithm  provided  the  class  F 
is  "orthogonally  invariant".  This  concept  is  defined  as  fol¬ 
lows.  Let  a)  be  a  real  nx  1  vector  such  that  ]|  oo  j|  =  1. 

Then 

(4.1)  Q  =  Q(w)  =  I  -  2  u)  wT 

2 

is  symmetric  and  orthogonal,  Q  =  I.  We  say  F  is  orthogonally 
invariant  iff 

(4.2)  A  c  F  =*  QAQcF 
for  every  Q  of  the  form  (4.1). 

Remark  4.1 

Let  us  recall  that  every  orthogonal  matrix  Q  can  be 
decomposed  into  a  product  Q  =  Q2  •  •  Qn  where  Qi  is  of  the 
form  (4.1).  Thus,  F  is  orthogonally  invariant  iff 

A «  F  =*■  QAQTcF 

for  every  orthogonal  Q.  ■ 

For  example,  the  class  of  symmetric  matrices,  the  class 
of  symmetric  positive  definite  matrices,  and  the  class  of 
matrices  with  condition  number  bounded  by  a  given  constant 
are  orthogaonally  invariant. 
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We  first  investigate  how  the  optimal  matrix  index  depends 
of  b.  (So  we  use  the  notation  k(A,b)  instead  of  k(A).) 

Lemma  4.1 

If  F  is  orthogonally  invariant  then 
(4.3)  k  (A,  b)  =  k  (Q  A  Q  ,  Ob)  . 

for  every  Q  of  the  form  (4.1).  | 

Proof 

Let  y^  =  Nk(A,b)  and  y£  =  Nk  (Q  A  Q  ,  Qb)  .  Then  y£  =  QyR  • 
Let  ♦=  {$.  }  be  an  algorithm.  Define  the  algorithm 
<P '  =*  as 

- 0 

Let  A'«  V(y£)  .  Then  A  =  Q  A*  Q  «  V(yk) 
and 

II  a'  $£(y,J)  -  Qb  II  -  II  Q  A'  o  ♦k(yk)  -  Q2  b  II  - 
e  II  A  tk(yk)  -  b  II 

This  implies  that  k(f',  QAQ,  Qb)  s  k($,  A,  b) .  Since  <p  is  an 
arbitrary  algorithm  this  yields  k(QAQ,  Qb)  s  k(A,b).  To  prove 
the  reverse  inequality  it  is  enough  to  interchange  the  role  of 
$  and  t '  •  ■ 

From  Lemma  4.1  easily  follows 
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Lemma  4 . 2 

If  F  is  orthogonally  invariant  then  the  optimal  class 
index  is  independent  of  b,  i.e. 

(4.4)  k (F  ,  b)  =  k (F)  ,  V  il  b  ||  =  1  .  ■ 

5 

Proof 

Let  b^  and  b^  be  two  vectors,  ||  b^  ||  =  ||  b^  ||  =  1.  Then 
there  exists  a  matrix  Q  of  the  form  (4.1)  such  that 

Q  b:  =  b2  . 

(The  existence  of  such  a  matrix  follows  from  the  Householder 
transformation).  Then  Lemma  4.1  guarantees  that 

k(A  ,  bx)  =  k(QAO,  Qb1)  =  k(OAQ,b2) 

which  easily  yields  that 

k(F  ,  bx)  =  k (F  ,  b2)  .  I 

We  now  prove  that  the  algorithm  $mr  has  properties 
analogous  tc  those  described  in  Lemmas  4.1  and  4.2. 

Lemma  4 ♦ 3 

If  F  is  orthogonally  invariant  then 

(i)  k(  $mr,  Q  A  Q  ,  Qb)  =  k(  $mr,  A,b) 
for  every  Q  of  the  form  (4.1), 

(ii)  k(  ♦mr,  F  ,  b)  =  k(  $mr,  F)  .  V  ||  b  ||  =  1, 

i.e., the  class  index  of  $mr  is  independent  of  b.  I 
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Proof 

(i)  Observe  that  the  coefficients  c*  of  (3.1)  are  inde¬ 
pendent  of  Q.  From  (3.2)  we  get 

xk  *  *mk(  VQAQ  '  Qb))  =  Q  $m£<  Nk(A,b))  =  Qxk. 

Since  ||  b  -  Axk  ||  =  1 1  Qb  -  Q  A Q  1 1  ,  we  have  to  perform 
exactly  the  same  number  of  steps  for  the  problem  (A,b)  and  the 
problem  (QAQ  ,  Qb)  to  find  an  e-approximation.  This  proves  (i)  . 

(ii)  Observe,  as  in  Lemma  4.2,  that 

k(  ♦mr  ,  A  ,  bx)  =  k($mr  ,  0  AQ  ,  b2) 
where  b2  =  Qb1,  ||  bx  ||  =  ||  b2  ||  =1.  This  yields 
k(#mr,F,b1)  =  k(^mr,  F  ,  b2) 

and  proves  (ii) .  g 

We  are  ready  to  prove  the  main  result  of  this  section 
which  exhibits  a  close  relation  between  the  matrix  index  of  <j»mr 
and  the  optimal  matrix  index. 

Theorem  4.1 

If  F  is  orthogonally  invariant  then  the  matrix  index  of 
the  mr  algorithm  differs  by  at  most  unity  from  the  optimal 
matrix  index,  i.e., 

(4.5)  k(A)  »  k^1”*,  A)  +  a,  Va«F, 


where 


a  «  0  or  a  =  -1. 
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Furthermore  either  a=0  or  a  =  -l 


can  hold. 


Proof 

Let  $  =  {<t>k  )  be  any  algorithm.  Let  k  =  k  ( $  ,  A)  <  + 
This  means  that 


(4.6)  ||  A  x£  -  b  ||  <  e  ,  A  e  V  (yk) 

where  x£  =  <}>  (N^  (A,  b)  )  .  Decompose 

xk  =  Z1  +  Z2 

where  is  a  linear  combination  of  b,  Ab, . . . ,  A' b  and  z^  is 

v 

orthogonal  to  b,  Ab,...,  A  b.  Define 


if  z2  /  o  , 
otherwise . 


Clearly  (o>,  A1  b)  =0  for  i  =  0,  1, 
A  =  Q  A  Q  with  Q  =  I 


. ,  k.  Let 

->  T 
Zu)  u) 


Then  A  c  F  and 

A1  b  =  Q  A1  Qb  =  Q  A1  b  =  A1  b  ,  i  =  1,  2 . k  . 

Thus,  A  eV(yk)  and 

||  A  -  b  ||  =  ||  A  XjJ  -  2  (w  ,  x^)  Aw  -  b  || 

Note  that  (w  ,  x^)  Au  ■  A»2  which  yields 

II  Ax,|  -  b  ||  =  ||  A*x  -  b  -  Az2  ||  . 


Observe  that 
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II  Azx  -  b  ||  s  i  ( ||Az1  -  b  -  Az2  ||  +  ||  Azx  -  b  +  A«2  || )  = 
=  \  <  II  A  x£  -  b  ||  +  !|  A  x£  -  b  I) )  <  e 

due  to  (4.6) . 

Recall  that  x^+^  =  ♦  (N^+^(A,b))  lies  in  the  same 

subspace  as  z ^  and 

II  Axk+1  '  b  II  *  II  Azl  ‘  b  II  <  e  • 

From  (3.10)  we  conclude  that 

k  (  $  mr\  A)  s  k  +  1  =  k(<f>  ,  A)  +1. 

Since  ♦  is  an  arbitrary  algorithm  we  have 

kU>mr,A)  s  k (A)  +  1  . 

On  the  other  hand  it  is  obvious  that  k(A)  s  k(4»mr  ,  a).  This 
* 

proves  (4.5) . 

We  now  show  that  either  value  of  a  can  occur.  Let  n  *  2 

and 

F  =  {A  :  A  ■  A*  >  0  and  cond(A)  s  M) 

be  the  class  of  2x2  symmetric  positive  definite  matrices  with 
condition  number  cond(A)  =  ||a||  || A-1 1|  bounded  by  a  given  number 
M  ,  M  > 1.  Note  that  F  is  orthogonally  invariant.  Let 
b  »  [1,0]T  and  e  s  Jl  /  ( 1  +  c * )  where 

c  =  2  /m"  /  (M  -  1)  . 

T 

(i)  Assume  first  that  Ab  =  [c  .  1  1  .  Then 

x.  »  ♦,mr(b  ,  Ab)  »  — S— r  b  , 

A  A  1+c^ 


and 
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Ax1  -  b  ||  =  1//  1  +  c  ^  c 


'/ 1  +  c2 


Thus,  k  (  0  ,  A)  >  1.  From  (2.13)  we  conclude 

k  ($  mr  ,  A)  =  2. 

We  show  that  k(A)  =  1.  Indeed,  knowing  b  and  Ab  we  conclude 
that  A  is  of  the  form 


c  1 


-1  XJ 


where  x  is  a  real  number  choosen  in  such  a  way  that  A  is 
positive  definite  and  cond(A)  s  M.  The  eigenvalues  of  A  arc 

Af  2  =  (c  +  x  t  Z[q  -  x)  2  +  4)/2. 

Thus  x  >  1/c  guarantees  positive  definiteness  of  A.  The  con¬ 
dition  number  of  A  is 


cond(A)  =  f  (x)  =  (c  +  x  +  Ac  -  x)  2  +  4)/(c  +  x  -  Ac  -  x )  ^  f  4  ) 


Note  that 


f  (x)  2  min  f  (x)  =  f  (c  +  2/c)  =  M  . 
x 

Since  cond(A)  is  at  most  M  we  conclude  that 
x  =  c  +  2/c  . 

This  means  that  V(y^)  consists  of  one  element  and  the  algorithm 
(b  ,  Ab)  =  A-1  b 

is  well  defined  and  has  zero  error.  This  proves  that 
k  (A)  =  1  =  k  (  $mr  ,  A)  -  1  . 

Hence,  (4.5)  holds  with  a  =  -l. 


If* 


.-asss..  .  is»  -  ;  1 
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(ii)  Assume  now  that  Ab  =  b.  Then,  of  course, 

xl  =  <b,  Ab)  =  b  and  k  (A)  =  k  (<J>  mr,  A)  -  1 

Hence  (4.5)  holds  with  a  =  0.  This  completes  the  proof  of  the 
theorem .  I 

From  Theorem  4.1  it  easily  follows  that  #mr  is  almost 
strongly  optimal  and  k($mr,  F)  differs  at  most  by  unity  from 
the  optimal  class  index  k(F)  . 

Corollary  4.1 

If  F  is  orthogonally  invariant  then 

(i)  the  minimal  residual  algorithm  is  almost  strongly 
optimal  (with  c  *  1  in  (2.11)). 

(ii)  k(F)  =  k($  mr.  F)  +  a  where  a  =  0  or  a=-l  . 

In  Section  6  we  show  that  either  value  of  a  in  (ii)  of 
Corollary  4.1  is  possible.  Since  k(F)  is  usually  large, 
Corollary  4.1  states  that  k(F)  is  essentially  equal  to 
k($  mr,  F) .  Thus,  it  is  enough  to  know  k  (  mr,  F)  to  conclude 
the  value  of  k(F).  In  Sections  5  and  6  we  find  k(  <J>  mr,  F) 
for  different  orthogonally  invariant  classes  F. 

We  end  this  section  by  a  remark  that  if  F  is  not  orthog¬ 
onally  invariant  then  none  of  the  optimality  properties  of  the 
algorithm  ♦mr  hold.  More  precisely  we  present  an  example  of  F 
for  which  the  mr  algorithm  can  be  arbitrarily  far  from  optimal. 
We  also  show  that  k(F)  depends  on  b. 
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Example  4.2 

Let  4»  be  the  class  of  n  x  n  symmetric  tridiagonal  matrices 
whose  diagonal  elements  are  equal  to  unity.  Thus  A  e  4>  implies 


F  =  {A  :  Aft;  cond  (A)  £  M} 

for  a  given  M,  M  >1.  The  class  F  is  not  orthogonally  invariant 
since  the  matrix  Q  A  Q  with  Q  of  the  form  (4.1)  is  not  neces¬ 
sarily  tridiagonal.  We  consider  two  cases  for  two  different 
vectors  b. 

(i)  Assume  first  that 
b2  ■  Cl.  1 . 1]T 

Then  knowing  z  =  Ab:  =  1T  we  get 

1  +  a^  »  z^ 

+l+a^=z^,  i  =  2,...,n-l. 

From  this  we  find  the  coefficients  a^, 

■i  *  zi  -  1 

ai  =  zi  “  1  ~  ai_i'  *  =  2'  —  ,n-l. 

Since  we  know  the  matrix  A,  the  algorithm 

*1  "  Vbl  '  Abl)  *  A_1  bl 

is  well  defined  and  ||  A  x^  -  b^  ||  *  0.  Thus 
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(4.7)  k(F  ,  bx)  =  1,  Ve  e  (  0  ,  1]. 

It  can  be  verified  that  for  sufficiently  small  e,  the  algo¬ 
rithm  <p  mr  has  to  use  the  information  N  (A,  b)  which  means 

n 

that 

k(  ♦  mr,  F,  bx)  =  n  . 

Hence  we  get  the  smallest  possible  value  of  k(F,  b^)  and  the 
largest  possible  value  of  k($mr,  F,  b^)  . 

(ii)  Assume  now  that 

b2  =  [1,  0 _ _  Of. 

T 

Then  Ab2  ■  Cl,  a^,  0 . 0]  supplies  only  the  information 

about  the  first  row  (and  the  first  column)  of  the  matrix  A. 
Similarly  knowing  Ab,  .  .  .  A*b,  we  know  the  first  i  rows  (and 
columns)  of  the  matrix  A.  Since  the  off-diagonal  elements  of 
the  jth  row,  J=i  +  1,  i  +  2,...,n-l,  are  unknown,  it  is  easy 
to  conclude  that  for  sufficiently  small  e  we  have 

(4.8)  k (F,  b2)  =  n  . 

Thus,  for  the  same  value  of  e,  k(F,  b)  can  be  equal  to  unity 
for  some  b  as  in  (4.7)  and  can  be  equal  to  n  for  a  different 
b  as  in  (4.8).  This  illustrates  that  if  F  is  not  orthogonally 
invariant,  k(F,  b)  depends  on  b.  ■ 


mm 
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5.  MATRICES  WITH  BOUNDED  CONDITION  NUMBER. 

In  this  section  we  deal  with  three  orthogonally  invariant 


classes 

of  matrices 

defined  as 

(5.1) 

F1  = 

(A 

T 

:  A  =  A  > 

0  ,  cond (A)  s  m} 

(5.2) 

F2  = 

(A 

> 

ii 

> 

-3 

cond (A)  s  M), 

(5.3) 

F3  = 

(A 

:  cond (A) 

s  M} 

where  cond(A)  =  I { A J [  ||A-*j|  is  the  condition  number  and  M  is 

a  given  number  such  that  Mil.  That  is,  F^,  is  the  class  of 
symmetric  positive  definite  matrices  with  condition  number 
bounded  by  M,  F2  differs  from  by  the  lack  of  positive  de¬ 

finiteness  and  F3  differs  from  F2  by  the  lack  of  symmetry. 

The  case  of  most  interest  is  when  M  is  large  since  such  prob¬ 
lems  arise  in  applications  and  are  difficult  to  solve. 

Our  main  interest  in  this  section  is  to  find  the  optimal 
class  indices  for  the  three  classes  and  to  see  how  the  lack  of 
positive  definiteness  and  the  lack  of  symmetry  increase  the  op¬ 
timal  class  index.  We  find  the  optimal  class  index  by  computing 
the  class  index  of  the  mr  algorithm  and  using  Corollary  4.1. 

We  are  ready  to  prove 


Theorem  5.1 
Let  Fa 


be  defined  by  (5.i)  for  i  =  l, 
k(F1)-a1  =k(*mr,  F1)-min(n,  Lin 


2 ,  3 .  Then 


1  + 


/  In 


i) . 


(5.4) 


e 
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(5.5)  k(F2)-a2  *  k ( <j>mr ,  F2)  =  min  (n ,  2  l  In  1  --/In  J  +  2)  . 

(5.6)  k(F3)  =  k(*mr  ,  F3)  =  n 

where  ai  =  0  or  =  -1  for  i  =  1,  2.  I 

Proof 

Let  (N^(A,b))  be  the  sequence  generated  by  the  mr 

algorithm.  Assume  first  that  M  =  1  and  A  e  F2 .  Then  A  =  cl  for 
some  nonzero  constant  c.  Since  Ab  =  cb,  x^  =  ^b  and  Ax^  -  b  =  0. 
Thus  k(F1)  =  k(F2)  =  k  (<j>mr  ,  F3)  =  k  (<J>mr  ,  F2)  =  1  which  agrees 
with  (5.4)  and  (5.5)  for  a^  =  a2  =  0.  Hence,  without  loss  of 
generality  we  assume  M  >  1  in  the  proof  of  (5.4)  and  (5.5). 

(i)  We  first  prove  (5.4).  It  is  well-known  that  for  sym¬ 
metric  positive  definite  matrices  the  mr  algorithm  converges  at 
least  as  fast  as  the  Chebyshev  algorithm,  i.e., 

(5.7)  ||  Axk  -  b  ||  s  2  pk/  (1  +  p2k) ,  k  <  n. 

where  p=*(/7f  -  l)/(/T?‘  +  1).  (To  show  (5.7)  it  is  enough  to 
define 

P(t)  =  1  -  Tk(f(t))/  Tk(f(0)) 

where  f  (t)  =  (2t  -  ||a"1||"1  -  |J A || )/  (  ||a||  -  ||a_1||_1)  and 
Tk  is  the  Chebyshev  polynomial  of  degree  k,  and  next  apply 
(3.4).) 

It  is  also  known  that  (5.7)  is  sharp,  i.e.  there  exists  a 


-5.3- 


matrix  A  from  F^  such  that  we  have  equality  in  (5.7).  For 
completeness  we  sketch  the  construction  of  such  a  matrix  A. 
Recall  that 


(5.8) 

for 

where  E " 
be  halved. 


k+1  u 

i£i  VV  Ts  (2i’  ■  0  •  J <s  sk 

=  cos  (_  71  ^  '  1  =  1,2 _ k  +  ]  , 

denotes  a  finite  sum  whose  first  and  last  terms 
Let 


are 


to 


(5.9) 

with 


A  . 

l 


c  [  M  +  1  +  (M  -  1)  Zi  J/  2  , 


i  =  1 ,  2,..,  k  +  1 


k+l#  -1 

c  =  2  i|1#  (  M+  1  +  (M  -  1)  zt  ; 


Then  Ai  ■  F  <  x2  <  •••  <  Ak+1 


CM,  Ak+1  ^  ^1  =  M"  Fufther  let 


±  ,-1/2  .-1/2 

/  2  l  9  2  9  *  * *  '  "k 


0  •  •  •  0  ^ 


-1/2  JL  .-1/2 
'  /y  Ak+i 


Note  that  ||  A  ||  ®  1.  Define  Q  ®  £2 . (ni  as  an 

orthogonal  matrix  such  that  Q  A  =  -b.  Finally  let 


(5.10)  A  =  Q  diag(A1#  ....  A^^,  ....  A^+1)  QT  . 

T 

Clearly  A  =  A  >0  and  cond(A)  *  M.  Thus  A  t  F^.  Note  that 
A£a  =  Aa  for  i  =  l,  2 , . . . ,  k  +1  and 

o  1/2  Ai  +  A2  £2+***+Ak  ^k  /2  Ak+1  '’k+l'  * 

Let 


T.(f (t)) 

Wj(t)  "  T(i(0)')  ’  f(t>  =  (2t  "  c(M  +  l))/(c(M  -  D). 


Then  Wj  (0)  ■  1 


Set  m  ■  k  +  1  , 
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i  .-1/2 


ci  ~  ~/T  Ai 


'  c2  =  -A2 


-1/2 


.  ck  ~  “Ak 


-1/2 


and  c 


=  ?=-  \~1/2 
k+1  /2  k+1 


in  (3.5).  Then  (Wj,Wg)  is  proportional  to 


k+1  k+1 

T  (f  (X . ) )  T  (f (X.))  ■ 
i=l  J  1  s  1  i=i 


l  "  Tj (zi)  Tg(zi)  =  0 


for  j  <  s  *  k,  due  to  (5.8).  Hence  {W^ )  is  orthogonal  and 
is  a  unique  solution  of  (3.4).  Thus 


(5.11)  |!  Axk  -  b  ||2  =  ||  Wk(A)b  ||2  =  £1"  X'1  T2(f  (Xi))/T2(f  (0)) 

=  (2pk/  (1  +  P2k))2 

since  T2(f(Xi))  =  T2(zA)  *  1  and  X"1  =  ||  X  ||2  *  i.  This  proves 

that  (5.7)  is  sharp. 

Let  k*  be  the  smallest  integer  such  that 
2  pk*  /  (1  +  p2k*>  <  e  . 


Then 

k*  *  L  In  l  +  lft-S7  /  In  J  +  1 . 

Let  k  *  k($mr  ,  Fj).  Note  that  if  k*  s  n  then  k  =  k*.  Indeed, 
k  <  k*  implies  k  <  n  and  we  can  find  a  matrix  A  from  such 

that 

||  Axk  -  b  ||  *  2pk/  (1  +  p2k)  2  e  . 

This  is  a  contradiction.  Thus  k  *  k*.  Since  xn  *  a  ,  k  is 
at  most  n.  This  and  Corollary  4.1  proves  (5.4). 

(ii)  We  now  prove  (5.5).  Let  p  ■  lk/2J  .  Since  A  is 
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2 

symmetric,  then  A  is  positive  definite.  Assume  that  ail 

2 

eigenvalues  of  A  lie  in  lc^  ,  c23.  Then  >  0  and  since 

2 

cond(A)  s  M  we  conclude  Cj/c^  s  M  .  Define 

T  (f (t2 ) ) 

P(t)  =  1  -  TP  (f(Q)-j-  ,  f(t)  =  (2t  -  c1-c2)/(c2  -  c1)  . 

P 

Note  that  P(0)  =  0  and  P  is  an  even  polynomial  of  degree 
2p  <.  k.  From  (3.4)  we  conclude 

(5.12)  ||  AxR  -  b  ||  i  J|  (I  -  P  (A) )  b  ||  <,  2rP/(l  +  p2p) 
where  now  p  =  (M  -  1)  /  (M  +  1)  .  Assuming  that 

(5.13)  2  lk/2  J  s  n  -  2 


we  construct  a  matrix  A  for  which  we  achieve  equality  in  (5.12). 
Similarily  to  (5.9)  define 

Ai  «  c  [  M2  +  1  +  (M2  -  1)  zjL  ]/2 ,  i  =  1,  2 - p  +  1 

with  c  *  2  [  M2  +  1  +  (M2  -  1)  z^  J_1  ,  and  zi  =  cos  (w  (p  +  1  -  i)  /p)  . 

(If  p  =  0  we  define  z^  =  -1.)  Then  for  p  z  1  we  have  A^^  =  c  <  A2  < 

...  <  Ap+1  =  cH2,  Ap+1  /  A^  =  M2.  Define  the  (p  +  1)  xl  vector 

d  as 


d 


L  rl/2 

v2  A1 


-1/2 


,-1/2  J,  ,-1/2 

p  '  /2  P+1 


Next  let  A  be  a  nxl  vector  defined  as 


*  =  7l  1  d  '  d,  0, . . ,  0  ]T  . 


Note  that 


||  A  ||—  1.  Further  let  Q  be  an  orthogonal  matrix  such 


-5.6- 


that  Q  X  =  -b.  Finally  let 
(5.14)  A  -  Q  diag  (/x^\  . 

'  ~^p+l'  *  *  * '  ~^p+l  ,Q  ' 

T  _ 

Clearly,  A  =  A  .  For  p  £  1,  cond  (A)  =  /X  X^  =  M  and  for 

p  *  0,  cond  (A)  =  1.  Thus  A  c  ?2  •  Note  that 

m.  &  (A2Jb,b)  =  PZ  “  X^"1  , 

J  i=l  1 

<A2J+1b,b)  =  0. 

It  is  straightforward  to  verify  that  the  solution  c*  of  (3.3)  is 
given  by 

C1  =  c3  '  *  *  *  =  c2f  k/2 1-1  =  0 
and  the  coefficients  c^  satisfy  the  system 


For  p»0  we  have  k*«l  and  *  0  which  proves  that  (5.12)  is 

sharp  in  this  case.  For  p  i  1,  we  get  the  same  system  as  for 

the  symmetric  positive  definite  case  with  k  replaced  by  p  and 
2 

M  replaced  by  M  .  From  this  we  conclude  that 

Axk  -  b  »  (A2)  b 
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where 
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Observe  that  is  orthogonal,  condfA^  *  1.  Thus  Ai  c  F3 . 
Further 

A*  b  =  A*  b  =  C0,...,0,  1,  0, ...,0]T 
A  ^  i 

for  1  <  n.  Let  $  ■  {$.  }  be  any  algorithm  and  let  £  denote 

K  n 

the  n-th  component  of  xk  =  ^k(Nk(A,b)),  k  <  n.  Then 

max(  ||  A2  xk  -  b  ||  ,  ||  A2  xk  -  b  || )  * 

max  ( |  Cn  -  1  | ,  |Cn  +  l  |)  ili  e. 

This  proves  that  k($,A)  z  n.  Since  k(F3)  is  independent  of  b 
due  to  Lemma  4.2,  we  conclude 

k{F3)  *  k(«mr,  F3)  «  n. 

This  proves'  (5.6)  and  completes  the  proof  of  the  theorem.  ■ 

Theorem  5.1  states  how  the  optimal  class  index  depends  on 
e  and  N.  For  small  value  of  e  and  large  values  of  M  we  can 
simplify  (5.4)  and  (5.5)  to 

(5.15)  k(Fx)  =  min  (n  ,  &  In  f  (1  +  o(l)))  +  ^ 

(5.16)  k(F2)  =  min  (n  ,  M  In  ^  (1  +  o  ( 1 ) ) )  +  a2 

Remark  5.1 

In  typical  applications  there  is  a  relation  between  n,  M 
and  e.  For  instance,  if  one  approximates  a  two  dimensional 
elliptic  differential  equation  then  the  corresponding  matrix  is 


jh. 
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tS. 18)  Nk(A,b)  =  [b,  Azr  Az2,...,  Azk] 

where  z^  =  z^{b,  Az^,  Az^,..,  Az^  is  an  arbitrary  function  of 

b  and  previously  computed  information.  Then  for  any  algorithm 

✓ 

4>  =  {<J>k)  there  exists  a  matrix  A  from  such  that 

(5.19)  ]|  A  ♦k(Mk(A,b))  -  b  ||  ^  1,  V  k  <  n. 

This  means  even  the  adaptive  information  (5.18)  is  too  weak  to 
find  an  e-approximation  using  less  than  n  steps.  Once  more 
(5.19)  holds  for  arbitrary  e  and  M.  See  section  9  for  a  general 
discussion  of  adaptive  information.  ■ 

We  summarize  this  discussion  in 

Corollary  5.1 

For  small  e,  large  M  and 

n  2  2  |.ln  l-  *  ^  /  In  +  2 


we  have 


k(Fx)  -  y  In  J  (1  +  o(l)), 


k(F2) 

k(Fx) 


2&  (1  +  o  (1) )  , 


k(F.) 

MfJ) 


^  In  1  (1  +o(D) . 


a 


ii  n  M'lMH  n  ii  rr—w  i  -  n  TVii»<  iurm  • 
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6.  MATRICES  OF  THE  FORM  A  -  I  -  B. 


In  this  section  we  consider  two  additional  orthogonally 
invariant  classes  of  matrices.  For  these  we  find  l:he  clas*' 
index  of  the  mr  algorithm  and  the  optimal  class  index.  We 
also  show  when  the  Chebyshev  algorithm  and  the  successive  ap¬ 
proximation  algorithm  are  optimal. 

Let 


(6.1) 

F4  =  {A  :  A  = 

I  -  B,  D 

=  b,  i!  B  !;  '  r  < 

1 1 

(6.2) 

F5  =  (A  :  A  = 

I  -  B, 

II  B  ||  s  p  < 

1)  . 

Thus  F4 

is  the  class  of 

symmetric 

positive  definite 

matrices  of 

the  form  I  -  B  where  the  norm  of  B  is  bounded  by  a  known  con¬ 
stant  p,  p  <  1.  The  class  F^  differs  from  by  the  non¬ 

symmetry  of  the  matrices  B.  Of  course,  F^  c  f^.  Note  that  for 
P  =  0,  ^4  =  F5  ®  (I).  To  exclude  this  trivial  case  we  assume  that 
P  >  0. 

Remark  6.1 

Observe  that 


(6.3)  Cond(A)  s  (1  +  p)/(l  -  p)  ,  V  A  e  F&  , 


and  (6.3)  is  sharp.  This  establishes 
class  F^  and  the  class  F^  with  M  = 
that  if  A  e  then  ||  A  ||  s  1  +  p  and 
These  bounds  do  not  hold  for  matrices 


a  relation  between  the 
(l  +  p)/(l-p).  Note,  however, 
||  A"1  ||  S  (1-p)-1. 
from  .  The  class  F^ 
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is  related  to  the  class  F^  with  the  same  M  =  (1  +  p)/(l  -  p)  . 
The  difference  between  and  F^  appears  if  M  goes  to 
unity,  i.e.,  p  goes  to  zero.  Then  F^  contains  only  the 
identity  matrix  whereas  F^  contains  matrices  of  the  form  cQ 
where  c  is  a  real  constant  and  Q  is  an  orthogonal  matrix.  | 
We  first  find  the  class  index  of  the  mr  algorithm  for 
the  two  classes  F^  and  F^. 


Theorem  6 . 1 

Let  F^  and  F^  be  given  by  (6.1)  and  (6.2).  Then 

J  2~  / . 2~ 

(6.4)  k($mr  ,  F4)  3  min  (n,L  In  1  *  ^  --e-  /In  *  *  *  ~  P~"  )  J  +  1)  , 

(6.5)  k(«mr  .  F5)  =  min  (n,  L  (1-  6)1+1) 

where  6  »  6(e,p)  satisfies 

(6.6)  0  s  6  S  i  ln(1  r  ~g2  •  ■ 

c  in  c 

Proof 

(i)  We  prove  (6.4).  Let  M  =  (l+p)/(l-p).  Define  the 
matrix  A  by  (5.10).  Let 

B  =  I  -  A 

where  c  is  given  by  (5.9).  Then  B  =  BT  and  ||b|(  =  p  .  Thus 
*  I  -  B  =  ~~  p  A  belongs  to  F^.  Then  xk  =  (Nk  (A^ ,  b) )  = 

“  r?p  ♦kr(Nk(A'b))  and  (5-11)  yieid® 

llVk  “bH  ‘Bl|A*JJr(Nk(A,b))  -  b ||  -2pJ/(l  +  pJk) 


where  p1  -  ( .-H  -  v  >  S 

Thus  the  class  index  of  the  rrr  algorithm  fc;  the  class  F.  is  the 

4 

same  as  for  the  class  with  M  -  ( l  +  p  i  /  :.  i  -  cl.  Since 

/FT  +  1  _  1  +  J\.  —  r- 

rW-  1  p 


(6.4)  follows  from  >5.4). 

(ii)  To  prove  (6.5)  observe  that  knowing  A 1  b  we  also  know 
B1b,  B  =  I  -  A,  for  i=0,  1 ......  k.  Thus  the  algorithm 

Xk  =  ^k  ^Nk  k'  *  =  b  +  Bb+...+  Bk 

is  well  defined  and 

li  Ax^  -  b  ||  =  j|  Bkb  ||  s  pk  . 

k- 1 

Since  lies  in  the  space  spanned  by  b,  Ab, . . . ,  A  b,  then 

(6.7)  ||  Axr  -  b ||  s  ||  Ax^  -  b  ||  s  pk 

where  xk  =  $kr  (NR  (A.  b)  )  .  We  now  find  a  lower  bound  or.  j|  Axk  -  b  jj 
for  k  <  n. 

Let  b=[l,0,...,0]1  and  B  =  pQ  where 


*  i  i  T 

B  b  =  p  [  0 ,  .  . .  ,  0 ,  1  ,  0 . 0  ]  i  <  n.  Since  A  =  I-E5, 

l+l 

(3.4)  easily  yields  that 

II  Ax,  -  b  ||  =  min  J|  P  < B )  b  || 

Pk ( 1 )  =  1  k 

where  pk  (t)  =  PQ  +  P1t  + .  . .  +  pktk  is  a  polynomial  of  degree  at 
most  k  and  p  +  p.  + .  . .  +  p.  =  1 .  Since  the  B1b  are  orthogonal, 
i  =  0,  1 ,  . .  . , k,  then 

II  PR(B)b  ||2  *  Pq  +  ?!  P2  +* '  -+  pk  p2k* 
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By  a  standard  technique  we  can  show  that 
(6.8)  ||Axk-b||  =  min  /(p2  + . .  . ♦  p2  p2k)  = 


P0+*-+Pk=1 


i  -  p 


2 (k+1 ) 


Compare  with  (6.7). 

Let  y  be  the  smallest  integer  such  that  \\  Ax^  -  b  ||  <  e. 
From  (6.7)  and  (6.8)  it  easily  follows  that 


k  s  min (n,  Lin  e/ln  p  J  +  1)  , 


2  2, 


*  *-in(n.lU}-J(l  P  >>1  *  1  )• 

This  proves  (6.5)  and  (6.6)  and  completes  the  proof  of  the 
theorem.  ■ 


To  find  the  optimal  class  index  for  the  class  we  use 
Theorem  6.1  and  the  properties  of  the  Chebyshev  algorithm.  It 
is  known  that  the  Chebyshev  algorithm  $  applied  to  the  sys¬ 
tem  x  =  Bx  +  b,  with  A  *  I  -  B(  ,  constructs  the  sequence 
{x^}  such  as 


(6.9) 


b  -  Axk  - 


Vi  fo1 

Wp> 


The  vector  x^  can  be  computed  from  the  recurrence  conditions 


x_i  =  0  ,  xQ  =  b  , 


(6.10) 


‘i+i  *  ci.i(B*i +  b  -  xi-i>  +  *i-r 


c0  -  2  '  cl+l 


1  -  fi  C, 
4  1 


-6.5- 


for  i  =  0,  1,...  .  Note  that  depends  on  b,  Bb .  B  b  or 

equivalently  on  Nk(A,b).  Thus 

xk  =  <j)kh(Nk(A,b)  )  . 

Remark  6 . 2 

Note  that  the  Chebyshev  algorithm  is  not  well  defined 
for  the  class  .  Indeed,  if  AeF^  then  the  norm  of 
B  =  I  -  A  can  be  larger  than  unity.  | 

Let 

(6.11)  q  ( e )  =  [in  ^  /  in  ^---p2  j  . 

We  find  the  class  index  of  the  Chebyshev  algorithm. 

Lemma  6 . 1 

k(*Ch  ,  F4)  =  q(e)  .  B 


Proof 

From  (6.  9)  we  have 

(6.12)  ||  b  -  Axk  ||  S  I  Tk+1(l/P)  I'1  -  2p5+1/  U  +  P2(k+1)> 

where  P2  -  p/(l  +  J\  ~  p^).  To  show  that  (6.12)  is  sharp  assume 
that  Bb  =  pb.  Then  (6.9)  implies 

b  -  Axk  =  (Tk+1  (1)/ Tk+1  (1/p)  }  b 

Since  Tk+^(1)  a  1  we  obtain  the  desired  result.  Note  that  the 
smallest  k  for  which  |  Tk+1  (l/o)  | <  t  is  equal  to  q(e). 


6.6- 


This  proves  Lemma  6.1. 

We  are  now  ready  to  derive  the  optimal  class  index. 


I 


Theorem  6 . 2 

k(F4)  =  min (n,q  (e) )  .  fl 

Proof 

Assume  first  that  q  =  q(e)  <  n.  Then  (6.4)  and  Corollary 
(4.1)  yield 


k($mr  ,  F4)  =  q  +  1  s  k(F4)  +  1. 

Since  k(F4)  s  k($ch,  F4),  Lemma  6.1  gives  k(F4)  =  q. 

If  q  2  n,  k($mr  ,  F4)  *  n  s  k(F4)  +  1  which  yields 

k (F4 )  2  n  -  1.  We  defer  the  proof  that  k(F4)  =  n  until 

Section  7,  Theorem  7.3.  ■ 

* 

We  obtain  the  optimality  properties  of  the  Chebyshev 
algorithm. 


Theorem  6 . 3 

(i)  The  Chebyshev  algorithm  is 


optimal 

if 

q  (e) 

s  n,  k($Ch,F4)  =  k(F4)  =q(e)  , 

not  optimal 

if 

q(c) 

>  n,  k($  ,F4)  -k(F4)  =q(e)  -n. 

(ii)  The  Chebyshev  algorithm  is  not  strongly  optimal. 

More  precisely  there  exists  a  matrix  At  such  that 

k(*ch,A)  -  k(A)  =k(*ch,F4)  -  1  =  q(e)  -  1. 
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Proof 

Conclusion  (i)  follows  directly  from  Lemma  6.1  and  Theorem  6.2. 

To  prove  (ii)  set  A=I-B  where  Bb=pb.  Then  k(A)=l  and 

C 

from  the  proof  of  Lemma  6.1  it  follows  that  k(<f>  ,  A)  =q(e)  = 

k(4>ch,F4).  I 

Note  that  the  assumption  q  (e)  s  n  implies  that  c  is  not 
too  small  relative  to  p  and  n.  For  small  e,  p  close  to 
unity,  and  n  so  large  that  q(e)  <  n,  we  have 

2 

(6.13)  k(Fj  =  k(<J>ch,F.)  =  k($mr,F.)  -  1  =  . .  (1  +  o(l)  )  • 

4  4  4  /2  (1  -  pi 

which  corresponds  to  k(F^)  with  M  =  (l  +  p)/(l-p). 

We  now  proceed  to  the  class  F5  .  First  of  all  observe 
that  we  do  not  have  the  exact  class  index  of  the  minimal  re¬ 
sidual  algorithm  since  an  unknown  6  appears  in  (6.5).  Note 
that  6  *  6(e,p)  goes  to  zero  with  e  for  fixed  p.  However, 
if  p  goes  to  unity  with  fixed  e,  then  i  e  (0,1)  and  (6.5)  is 
not  useful.  It  is  possible  to  improve  (6.5)  but  we  do  not 

pursue  this  here. 

For  sufficiently  small  e,  6  *  o(l)  and  (6.5)  can  be 
written  as 

(6.14)  k(<J>mr,F5)  =  min(n,  [  (1  +  o(l) )]  +1). 

From  Corollary  4.1  we  find 

k(F5)  *  min(n,  (l+o(l))J  +  1)  +  a3 


(6.15) 
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where  =  0  or  a3  =  -1-  If,  additionally,  p  is  close  to 
unity  and  n  is  so  large  that  the  minimum  in  (6.5)  is  attained 

for  the  second  argument,  we  have 

1 

(6.16)  k(F5)  =  (1  +  o(l))  . 


Note  that  for  the  corresponding  class  F3 
k(F3)  *  n.  Comparing  (6.16)  with  (6.13) 


(6.17) 


k(F5) 

Mf4) 


= 

V  1  -  p 


(1  +  0(1))  . 


we  always  have 
we  see  that 


This  shows  how  the  lack  of  symmetry  increase  the  optimal  class 
index. 

We  now  show  that  the  successive  approximation  algorithm 
$sa  is  asymptotically  optimal  for  the  class  F5.  This  algo¬ 
rithm  constructs  the  sequence  {x^}  as 


(6.18) 


b 


li+l 


Bxa  +  b  , 


Thus  x^  depends  on  b,  Bb 
A  =  I  -  B.  Obviously 


i  *  0,  1,...  . 

,  B*b  and  x^  =  $®a(Nk(A,b))  with 


||  b  -  Axk  ||  -  ||  Bk+1  b  ||  S  pk+1  . 


Note  that  this  estimate  is  sharp  since  for  Bb  =  pb  we  get 

S  A 

equality.  This  proves  that  the  class  index  of  $  is  the 

k+l 

smallest  k  for  which  p  <  c.  Thus 


(6.19) 


k (♦**. Fs)  =  [in  e  /  In  p J  • 
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Comparing  with  (6.16)  we  have 


k($*a,F5) 

k(F5) 


1  +  o(l) . 


for  small  e  and  large  n.  This  shows  that  the  successive  ap¬ 
proximation  algorithm  is  asymptotically  optimal.  As  was  the 
case  for  the  Chebyshev  algorithm,  the  algorithm  $sa  is  not 
strongly  optimal  since  for  A  =  I  -  B,  where  Bb  =  pb,  we  have 

k  (<J>sa,  A)  -  k(A)  =  k(4>sa,F5)  -  1  =lln  c  /  In  p  I-  1  . 


We  summarize  these  properties  in 


Theorem  6 . 4 

S  A 

The  successive  approximation  algorithm  $  is  asymp¬ 
totically  optimal  for  small  e  and  large  n, 

k(F5)  =  k{$sa,F5)  =  Lin  e/ In  pJ. 

The  algorithm  $*a  is  not  strongly  optimal.  I 


The  importance  of  our  optimality  result  concerning  F^  is 
that  in  numerical  practice  the  linear  system  Mx  =  g  is  often 
transformed  into  x  =  Bx  +  b.  Examples  of  such  transformation 
are  the  Richardson,  Jacobi,  Gauss-Seidel  and  SOR  algorithms. 
Our  result  states  that  asymptotically  the  transformed  system 
with  a  nonsymmetric  matrix  B  is  best  solved  by  the  successive 
approximation  algorithm. 


7.  GENERALIZED  CRITERIA 

In  this  section  we  introduce  a  family  of  approximation 
criteria  depending  on  a  parameter  p.  The  criterion  used  in 
Sections  2-6  corresponds  to  p  *  1.  The  values  of  p  of 
greatest  practical  importance  are  p  =  0,  1/2,  1. 

A  lower  bound  on  the  optimal  matrix  index  is  obtained 
for  any  orthogonally  invariant  class  and  for  any  value  of  p. 
For  some  values  of  p,  we  define  a  "generalized  minimal 
residual  algorithm"  for  which  this  lower  bound  is  almost 
achieved.  We  next  find  the  optimal  class  indices  for  the 
class  F4  with  arbitrary  p  and  for  the  class  F5  with  p  =  0. 
We  establish  the  optimality  of  the  Chebyshev  algorithm  for  F^ 
with  any  p  and  the  optimality  of  the  successive  approximation 
algorithm  for  F&  with  p  *  0. 

In  (2.1)  we  defined  an  e-approximation  as  a  vector  whose 
residual  has  norm  less  than  e.  Here  we  assume  that  the  e-ap¬ 
proximation  x  satisfies  the  inequality 

(7.1)  ll_APlx  -  alii  <  e 
I!  Ap  all 

where  a  =  A_1b  and  p  is  a  nonnegative  real.  Note  that  for 
p  =  1,  (7.1)  coincides  with  (2.1).  For  p  =  0,  (7.1)  means 

that  the  relative  error  of  x  is  less  than  e. 

If  p  is  not  an  integer  we  assume  that  A  is  symmetric 
and  positive  definite  to  guarantee  the  existence  of  Ap. 


We  generalize  the  concept  of  the  matrix  index  of  <(>  to 
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(7.2)  k($,A)  *  min{k  :  ||AP(xk  -  A-1b)  ||/||  Ap"1b||  <  c,  VAcV(yk)) 

where  4  *  { > ,  xR  -  ♦^(N^A.b))  and  V(yR)  is  given  by  (2.4). 

(If  the  aet  of  k  is  empty,  we  set  k(4,A)  =  +<*>.)  Then  all 

concepts  introduced  in  Section  2  may  be  generalized  in  an  obvious 

way  using  the  new  definition  of  the  matrix  index  of  $  . 

*  *  * 

For  given  A  and  m  define  the  coefficients  c0'  C1 . cm 

and  the  error  e(A,m)  as 

(7.3)  e(A.m)  =  ||  AP(a  -  cjb-.  .  .-c*Amb)  ||  =  min||Ap(a  -  cQb- .  .  .  -c/% 

ci 

Let 

m(A)  =  mintm  :  e  (A,m)/||AP  a||  <  e,  V  A  c  V(ym)} 

where  a  =  A_1b.  We  prove 

* 

Theorem  7.1 

If  F  is  orthogonally  invariant  then 

(7.4)  k(A)  *  m(A)  ,  V  A  c  F.  ■ 

Proof 

As  in  the  proof  of  theorem  4.1  let  4  =  (4>k}  be  any  algo¬ 
rithm  such  that  k  =  k(4,A)  <  +*>.  This  means 

(7.5)  II  Ap(xk  -  a)  II  /  ||AP  oil  <  t  ,  v  A  £  V(yk)  , 
where  xR  *  ♦R(Nk(A,b)).  Decompose 

xk  "  *1  +  z2 
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Jr 

where  *1  t  lin(b,  Ab,..,  A  b)  and  z2  is  orthogonal  to 
b,  Ab,..,  Akb.  Define  A^  =  Q  A  Q  with  Q  =  I  -  2wujT  and 
u  “  Sj  /  II^H  for  a  nonzero  z2  and  us  =  0  for  z2  =  0.  Then 
Ax  e  F  and  A^b  =  A1b,  i  =  1,  2,  .  . ,  k.  Thus  e  V  (yR)  . 

Observe  that 

(7.6)  =  Qa  and  ||  A^  ||  =  ||  Ap  a  ||  • 

Furthermore  , 

(7.7)  ||  aP(x,|  -  Sx)  ||  =  ||  Ap  Q  (zx  -  a  +  z2  +  2(w,a)w)  ||  = 

=  ||  A*5^  -  a)  +  Ap  z 2  +  2  (  (u,  a)  -  (u>,  zL  -  a  +  z2  +  2  (u>,  a)  w)  )  Ap 
=  ||  A*5^  -  a)  +  Apz2  -  2  (w,  z 2 )  APcu } |  =  ||  A13^  -  a)  -  Apz2  j|  . 
From  (7.5),  (7.6)  and  (7.7)  we  get 


l|Ap  a|| 


1 1  Ap  ( z x  -  a)|| 

II Ap  a |f 


1  (llAP(z1-S)-APz2|| 

2  1  II AP  S|| 


|  AP (z^  -  a) +Apz2 

iiap  sii 


s: 


1 

2 


|Ap(x'  -  a,] 
II  A?  5,  II 


IIap(Xi;-S)|| 

II Ap  a|| 


<  e 


Thus  k  2  m(A).  Since  $  is  an  arbitrary  algorithm  we  conclude 
k  (A)  2  m(A) .  Hence  (7.4)  is  proven. 

Theorem  7.1  provides  a  lower  bound  on  the  optimal  matrix 
Index.  The  next  part  of  this  section  is  devoted  to  finding 
algorithms  whose  class  indices  are  close  to  this  lower  bound. 
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As  we  shall  see,  this  can  only  be  done  for  certain  values  of  p. 

We  check  when  the  coefficients  c?  defined  by  (7.3)  can 

be  computed  in  terms  of  the  information  Nk(A,b).  From  (7.3) 

it  follows  that  c*  =  (ct,  c?,...,  c*  satisfies  the  linear 

u  1  m 

equations 

(7.8)  He*  =  h 

where  H  =  ( (A*+^b,  A^  +^b)  ),  i,J  =0,  l,...,m,  and 

h  =  t  ufb.tP-hi .  (Am+Pb,Ap-1bnT. 

We  consider  two  cases. 

(i)  A  =  aT.  Then  if  2p  is  integer,  2p  £  1  and 

m  =  k  -  T p  1 ,  the  vector  c*  depends  only  on  N^tA.b). 
T 

(ii)  A  A  .  Then  if  p  is  integer,  p  2:  1  and  m  *  k  -  p, 
•■the  vector  c*  depends  only  on  Nk(A,b). 

If  either  (i)  or  (ii)  holds  then  the  algorithm  $mr  =  (4>kr), 

(7.9)  xk  =  ♦kr(Nk(A,b))  »  cQ  b  ♦  ..+  Ck_rp-|A  P  b 

is  well  defined  and  is  called  the  generalized  minimal  residual 
algorithm. 

Note  that  for  p  =  1,  (7.9)  coincides  with  (3.2).  Assuming 
that  A  -  a^  >  0  we  can  set  p  *  1/2  and  the  algorithm  <f>mr 
is  known  as  the  classical  conjugate  gradient  algorithm.  See 
for  instance  Stiefel  [583.  In  this  case  one  of  the  possible 
ways  to  compute  xk  is  as  follows. 


Let  Xq  *  0.  For  i  «0,  1 


k  -  1  define 
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(7.10) 


where 


(7.11) 


Xi+1  “  xi  + 


—  tfi.ilXi-Xi.i)  -tj), 


rA  =  Axi  -  b, 


(Ari'  ri.) 

qi  =  (r1,ri)  ~  fi-l 

f  ,  =  0,  f.  ,  =  - :  q.  .  . 

-1  i-l  (ri_i , ri_i )  l-l 


Comoare  with  (3.7)  and  (3.8). 

For  p  =  0  and  A  =  A  the  first  component  (b,a)  of 
the  vector  h  is  in  general  unknown.  If,  however,  one  considers 
the  consistent  system  Mx  =  g  and  if  one  agrees  to  multiply  this 
system  by  MT  then  A  =  MTM,  b  =  MTg,  and  (b,u)  =  (g,g)  is 
computable.  Then  the  generalized  minimal  residual  algorithm  is  well 
defined  and  is  known  as  the  minimum  error  algorithm.  In  this  case 

*  K 

we  can  compute  as  follows.  Let  Xq  =  0.  For  i=0,  l,...,k  l 
define 


(7.12) 


where 


(7.13) 


Li+1 


xi  + 


—  tfi_1Ui  -xi_1)  -rt},  rA  =  Ax.  -b. 


(rl>rl) 

l|Mxi-g|| 


-  fi-r 


f 


-l 


0,  f 


i-l 


l|Mxi  -  g|j2 
1 1  MXi  _i  -  g  ||  2  1-1 


We  are  ready  to  show  that  the  generalized  minimal  residual 

t 

algorithm  is  almost  strongly  optimal. 
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Theorem  7 . 2 

Let  F  be  orthogonally  invariant.  Suppose  that  the  fol¬ 
lowing  two  conditions  hold: 

(i)  If  A  e  F  implies  A  =  AT,  VAeF,  then  2p  is  an  integer, 
otherwise  p  is  an  integer. 

(ii)  If  (b,a)  is  known  and  A  £  F  implies  A  =  AT,VA€F, 
then  p  *  0,  otherwise  p  >  0. 

Then  the  generalized  minimal  algorithm  is  almost  strongly  op¬ 
timal, 

(7.14)  k  (A)  +  a  =  k  ($mr.A)  =  m(A)  +  Tp  |,  VAcF, 

where  a  is  an  integer  from  [0,  Tp)).  ■ 

Proof 

Note  first  that  ( A )  and  (ii)  guarantee  that  the  algorithm 
is  well  defined.  From  (7.3)  and  (7.9)  we  have 

||  Ap(a  -  xk)  Ii  =  e  (A,  k- fp  1)  . 

Thus 

k($mr,A)  <*  m(A)  +  Tpl. 

Obviously  k($mr,A)  i  k(A)  which  due  to  (7.4)  yields 
Or  a  =  k  ( $mr,  A)  -  k  (A)  s  Tpl. 

This  proves  (7.14).  ® 

Observe  that  for  p=l,  the  conditions  (i)  and  (ii)  are 
always  satisfied  and  Theorem  7.2  coincides  with  Theorem  4.1. 


For  p = 1/2  ,  Theorem  7.2  states  that  the  classical  conjugate 
gradient  algorithm  is  almost  strongly  optimal  and  the  matrix 
index  of  the  classical  conjugate  gradient  differs  by  at  most 
unity  from  the  optimal  matrix  index. 

If  p  can  be  set  equal  to  zero,  then  (7.14)  states  that 

k (A)  =  k ($mr, A)  «  m (A) 

Thus,  the  minimum  error  algorithm  is  strongly  optimal. 

We  now  end  this  section  by  finding  the  optimal  class  index 
k(F)  for  the  class  F  =  F^  for  arbitrary  p,  and  for  the  class 
F -  Fg  with  p  -  0.  We  also  indicate  which  algorithms  are  optimal 
but  not  strongly  optimal.  Recall  that 


Thaoraa  7.3 

Let  F«F^  be  defined  by  (6.1).  Then  for  arbitrary  p, 
k(F4)  ■  min(n,  q (e) ) . 

Let  the  be  Chebyshev  algorithm  defined  by  (6.10).  Then 

k($ch,  f4)  *  q(c) . 

If  q(e)  s  n  then  the  Chebyshev  algorithm  is  optimal  but  not 
strongly  optimal. 

If  q(e)  >  n  then  the  Chebyshev  algorithm  is  not  optimal.  I 

Siaal 

Based  on  the  proofs  of  Theorems  5.1  and  6.1  we  can  show  that 


for  arbitrary  p, 
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m(F.)  max  m(A)  *  min(n,  q(e)). 

AcF4 

From  (6.9)  it  is  obvious  that 

|[AP(a  -  <J>ch(Nk(A,b))||  S  2  |f  AP  a  ( {  p5+1/(l  +  p2(k+1))«  VAcF4  . 
where  P2  =  P  / (1  +  -  p^) .  Thus 

k(*ch,  F4)  s  q(e). 

If  q(e)  in  then 

k(F4)  s  k(*ch,  F4)  S  q (e)  =  m(F4)  s  fc (F4 ) 

due  to  Theorem  7.1.  This  proves  the  optimality  of  the  Chebyshev 

algorithm  for  q(e)  s  n. 

cH 

We  show  that  k($  >  *  q(e).  As  in  the  proof  of  Lemma  6.1 

define  A=I-B  with  Bb=pb.  Observe  that 

a  -*ch(Nk(A,b))  =  <Tk+l(1)/Tk+l(1/p)  }  n 

Then  q(e)  =  k($ch,  A)  i  k(#ch,  F4  )  i  q(c)  which  yields  the  needed 
result. 

Since  k(A)  »  1,  we  have  k($ch,  A)-k(A)  =q(c)-l  which  proves 
that  the  Chebyshev  iteration  is  not  strongly  optimal.  Finally, 
note  that  k(F4)  is  at  most  n  which  completes  the  proof  of 
Theorem  7.3.  ■ 

Remark  7.1 

Observe  that  the  proof  of  Theorem  7.3  for  p*l  completes 


the  proof  of  Theorem  6.2. 


I 
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For  the  class  F  =  with  p  =  0  we  prove 
Theorem  7 . 4 

Let  F  =  Fg  be  defined  by  (6.2)  and  let  p  =  0.  Then 
k(F5)  =>  min(n,  lln  e  /  In  pj). 

Let  <f>  be  the  successive  approximation  algorithm  defined 
by  (6.18) .  Then 

k  (<psa,  F5)  =  Lln  e  /  In  p  J. 

If  I  In  c  /  In  p  J  s  n  then  the  successive  approximation 

algorithm  is  optimal  but  not  strongly  optimal. 

If  Lln  e/lnpj  >  n  then  the  successive  approximation 

algorithm  is  not  optimal.  I 

* 

Proof 

Let  xk  =  <f®a(Nk(A,b)  >  with  A«I-B.  Since 

l|o-xk||  .  ||Bk+1  atl  <  Pktl|MI, 

we  have 

(7.15)  k(F5)  s  k(*sa,  F5)  s  Lln  e/lnpj. 

We  show  that  (7.15)  is  sharp  for  Lln  c  /  In  p  J  s  n.  Let  k  <n 
and  A  =*  I  -  B,  A  =  I  -  B,  where 


where  0  end  §  are  (k+1)  x  (k+1)  matrices  and  I  is  the 
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(n-k-1)  x  (n-k-1)  identity  matrix.  Let 


i 


Q  = 


1  0 


-1 


1  0 


Observe  that  ||b||  =  ||  B 1 1  =  p  which  implies  that  A,  At  F,..  Further¬ 
more  B*b  =  Bib,  i  =  l,2 . k,  for  b=[l,0,...,0]T.  Thus 

Aib  =  A1b,  i  *  1,  2,  ....  k.  Let  a  =  A_1b  and  a=A-1b.  Then  it  is 


easy  to  show  that 


k+1 


(7.16)  a  *  —  £k-;y  a  . 


1  +  p 

Let  <)>  be  an  arbitrary  algorithm,  xk  *  0k(Nk(A,b))  and  let 
k  =  k  (  $ ,  A)  <  n.  Then 

.  II x*  -  all  II XK  -  «l 

(7.17)  .  jnax  - - -  ,  - * -  ]<  e  . 

Hall  ||a|| 


From  (7.16)  and  (7.17)  we  have 
-  k+1 

2*-^  ||a||  -  ||5-a||  *  IK-SH  ♦  II*,, -«||  <  c(||a||  +  ||a||) 


1  +  P 


2e 


1  +  P 


k+1 


k+1 


Thus  p  <  e  ,  which  implies 


k  =  k($  ,  A)  a  Lin  c/ln  pJ. 


Since  $  is  arbitrary, 


k(Fg)  2  k(A)  2  Lin  e  /  In  pJ 


-  ■'  I  <00111  0111  i  I  Ml  i  T-  |  -  i  Sfff-  - 
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From  (7.15)  we  get 

k(F5)  =  k(^sa.F5)  =  tin  e/ln  pJ . 
as  long  as  Lin  e/ln  pJ  £  n. 

This  proves  that  the  algorithm  $sa  is  optimal. 

Define  A*l-B  where  Bb=pb.  Then  k($sa,A)  =  Lin  e/ln  p 
This  and  (7.15)  proves  that 

k($sa,F5)  -  Lin  e/ln  PJ. 

Since  k(A)  =  1,  we  have 

k(4*a, A) -k(A)  -  k (♦**, F5) -1  =  Lin  e/ln  PJ-1 

which  proves  that  the  algorithm  is  not  strongly  optimal. 

Note  that  k(F,j)  is  at  most  n  which  completes  the  proof 
of  Theorem  7.4. 

For  p=l  we  established  only  the  asymptotic  behavior  of 
k(Fj).  For  p=0  we  have  the  exact  value  of  k(Fg). 
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8.  COMPLEXITY 


We  have  given  lower  and  upper  bounds  on  the  optimal  matrix 
and  class  indices  for  computing  an  e-approximation.  We  show 

how  these  results  can  be  employed  to  bound  complexity  (minimal 
cost)  of  finding  an  e-approximation. 

We  first  outline  our  model  of  computation.  For  sim¬ 
plicity,  let  the  cost  of  each  arithmetic  operation  be  unity. 

We  assume  that  the  cost  of  one  matrix-vector  multiplication 
Ax,  for  an  arbitrary  vector  x,  is  cn.  Note  that  c  =  c(A) 
depends  on  the  structure  of  A  and  for  sparse  matrices  c  is 
usually  proportional  to  unity  rather  than  to  n.  In  this 
paper  we  ‘discuss  algorithms  depending  on  the  information 
Nk(A,b)  a  Cb,  Ab,...,  A*b] .  As  noted  above  this  does  not 
necessarily  means  that  we  actually  compute  Aib,  i  =  l,  2,...,k. 
Rather  it  means  that  we  compute  Az^,  i  =  l,  2,...,k,  where 
z A  is  a  linear  combination  of  b,  Ab,...,  Ai-^b.  For  any 
choice  of  z^,  we  perform  k  matrix-vector  multiplications 
and  we  therefore  assume  that  the  cost  of  N^(A,b)  is  ken. 

Let  =  {<|>k}  be  an  algorithm.  To  find  x^  =  4>k  (N^  (A,  b)  )  , 
given  yk  =  Nk(A,b),  we  compute  Let  d($,k)  denote 

the  combinatory  complexity  of  ♦,  i.e.  the  cost  of  combining 
the  information  y^  to  produce  x^.  Note  that  yk  represents 
(k  +  l)n  scalar  data.  We  postulate  that  the  algorithm  <f> 
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uses  every  scalar  piece  of  data  at  least  once  and  therefore 

(8.1)  d(<J>,k)  2  kn,  V$. 

If  the  combinatory  complexity  of  4>  is  linear  in  the 
total  number  of  scalar  data  of  N^(A,b),  i.e.,  d(<j>,k)  s  c^kn 

for  some  "small"  constant  independent  of  A,  then  d(4>,k) 

is  close  to  minimal. 

Let  comp ($, A)  denote  the  cost  of  finding  an  e-approxi¬ 
mation,  i.e.  the  cost  of  computing  xk  =  (N^  (A,  b) )  such 

that  ||  Axk  -  b  ||  <  e  (or  J|  Ap(xk  -  a  )  ||/||  Ap_1b  ||  <  e  if 
criterion  (7.1)  is  used)  for  every  matrix  A  which  has  the 
same  information  as  A,  AcV(yk).  By  definition  we  have  to 
perform  k(4>,A)  matrix-vector  multiplications  to  find  an 
e-approximation.  Thus 

(8.2)  comp($,A)  «  c(A)n  k(<J>,A)  +  d  (<J> ,  k  (0,  A) ) 

Remark  8.1 

The  quantities  defined  in  this  section  also  depend  on  c, 
b,  Nk>  and  F.  We  remind  the  reader  that  for  simplicity  we  do 
not  exhibit  this  dependence  in  our  notation  or  terminology.  0 
Due  to  (8.1)  we  have 

(8.3)  comp($,A)  £  (c(A)  +  l)n  k($,A). 

We  seek  algorithms  with  minimal  complexity.  Define 

(8.4)  comp(A)  =  min  comp($,A)> 

♦ 
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Since  k(4»,A)  2  k(A),  (8.3)  yields 

(8.5)  comp(A)  2  (c(A)  +  l)n  k(A). 

Thus  equations  (8.4)  and  (8.5)  motivate  the  following  defini¬ 
tion. 

An  algorithm  <$>  is  an  optimal  complexity  algorithm  for 
A  iff 

(8.6)  comp($,A)  =  comp (A) 

and  4>  is  an  almost  optimal  complexity  algorithm  for  A  iff 
there  exist  two  small  integers  c^  and  c2  such  that 

(8.7)  comp($,A)  i  (c  (A)  +  c^nfkfA)  +  c2). 

Due  to  (8.5),  Cj  i  1  and  c2  2  0.  In  many  cases,  k(A) 
is  much  larger  than  c2  and  c(A)  is  much  larger  than  Cj^. 

This  yields 

(8.8)  comp(A)  =  comp($,A)  *  c(A)n  k(A). 

We  are  ready  to  prove 

Theorem  8.1 

An  almost  strongly  optimal  algorithm  with  linear  combi¬ 
natory  complexity  is  an  almost  optimal  complexity  algorithm 
for  every  A  from  f.  • 

Proof 

If  $  is  an  almost  strongly  optimal  algorithm  then 
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k($,A)  s  k(A)  +  c2,  V  A  «  F,  where  c2  is  a  small  integer  due 
to  (2.11).  The  algorithm  $  has  also  linear  combinatory  com¬ 
plexity,  d($,k)  s  Cj^kn.  Thus 

comp ($ ,  A)  s  (c (A)  +0^)0  (k (A)  +  c2) 

which  agrees  with  (8.7).  I 

We  proved  that  the  minimal  residual  algorithm  <f>mr  is 
almost  strongly  optimal  for  orthogonally  invariant  classes. 

See  Corollary  4.1.  We  now  consider  the  combinatory  complexity 
of  $mr  .  For  the  classes  F  =  or  F  =  F4<  the  algorithm 
0mr  is  defined  by  (3.7)  and  (3.8).  From  this  it  is  obvious 
that  its  combinatory  complexity  is  linear  with  c^  s  14  +  2/n. 
For  the  classes  F  =  F2  or  F  =  F^,  the  combinatory  complexity 
of  $mr  .  is  also  linear  due  to,  as  noted  in  Section  3,  a  fast 
algorithm  for  the  solution  of  any  linear  system  with  a  Toeplitz 
matrix. 

Observe  that  the  Chebyshev  algorithm  $mr  defined  by 

(6.10)  for  the  class  F  **  F^  also  has  linear  combinatory  com- 

ch 

plexity  with  c,  s  4  +  5/n.  The  algorithm  $  is  not  an 
almost  optimal  complexity  algorithm  for  every  A.  Theorem  6.3 
states  that  d®*1  is  optimal  whenever  q(e)  s  n  and  then 
k($ch,F^)  «  k(F^).  Thus,  if  A  is  such  that  k(A)  is  close  to 
k(Fj),  then  the  Chebyshev  algorithm  is  an  almost  optimal  com¬ 
plexity  algorithm  for  A. 

Similarily,  for  different  criteria  as  defined  by  (7.1)  we 
conclude  that  for  the  class  F  ■  F^  or  F  ■  F4  with  p  *=  1/2, 
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the  classical  conjugate  gradient  algorithm  is  an  almost  opti¬ 
mal  complexity  algorithm.  The  Chebyshev  algorithm  is  an 
almost  optimal  complexity  algorithm  for  matrices  A  such  that 
k(A)  is  close  to  k(F)  for  F  =  f4  with  arbitrary  p,  when¬ 
ever  q(e)  <  n. 

Finally,  for  the  class  F  =  F^  with  p  =  0,  observe  that 
the  successive  approximation  algorithm  4»sa  defined  by  (6.18) 
has  combinatory  complexity  equal  to  kn  which  due  to  (8.1)  is 

e  a 

minimal.  Theorem  7.4  states  that  $  is  optimal  whenever 
[In  c/ In  pj  s  n.  Thus,  for  matrices  such  that  k(A)  =  k(F5) 
the  algorithm  <p  is  an  optimal  complexity  algorithm. 

We  summarize  these  results  in 

Theorem  8 . 2 

(i)  The  minimal  residual  algorithm  is  an  almost  optimal 
complexity  algorithm  for  every  matrix  A  from  the 
classes  F F2  and  F4  with  p  =  1. 

(ii)  The  Chebyshev  algorithm  is  an  almost  optimal  algo¬ 
rithm  for  matrices  A  from  the  class  F^  for 
arbitrary  p  whenever  q(e)  s  n  and  k(A)  is  close 
to  k(F4). 

(iii)  The  classical  conjugate  gradient  algorithm  is  an  al¬ 
most  optimal  complexity  algorithm  for  every  matrix  A 
from  the  classes  F^  and  F^  with  p  =  1/2. 

(iv)  The  successive  approximation  algorithm  is  an  optimal 
complexity  algorithm  for  matrices  A  from  the  class 
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Fjj  with  p  =  0  whenever  Lin  e/lnpJs  n  and  k(A)  *  k(F5).  ■ 
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9.  COMPARISON  WITH  DIRECT  ALGORITHMS 

The  results  of  this  paper  enable  us  to  compare  direct 
algorithms  for  the  solution  of  linear  equations  with  optimal 
(or  nearly  optimal)  algorithms  using  the  information  Nk(A,b). 
This  comparison  can  be  done  for  different  classes  of  matrices 
and  with  mathematical  rigor.  Here,  however,  we  confine  our¬ 
selves  to  the  comparison  of  the  Gauss  elimination  algo¬ 
rithm  and  the  minimal  residual  algorithm  for  dense  and  sparse 
matrices  from  the  class  F^,  i.e.  for  the  class  of  symmetric 
and  positive  definite  matrices  with  condition  number  bounded 
by  M. 

We  first  consider  the  cost  of  the  arithmetic  operations 

and  then  briefly  discuss  storage  requirements.  Assume  A  is 

known.  We  discuss  the  case  of  dense  matrices.  Then 

3  2 

the  Gauss  elimination  algorithm  requires  n  /3  +  0(n  ) 
arithmetic  operations  to  find  the  exact  solution  of  Ax  =  b. 
(We  neglect  the  effect  of  roundoff  errors.)  For  simplicity 
we  assume  that  the  cost  of  the  Gauss  elimination  algorithm 
is  n^/3,  comp(G)  =  nV  3,  taking  the  cost  of  each  arithme¬ 
tic  operations  as  unity. 

Even  if  the  matrix  A  is  known,  it  can  be  more  effi¬ 
cient  to  use  "partial"  information  N^fAjb)  and  apply  the 
minimal  residual  algorithm.  Of  course,  in  this  case  instead 
of  the  exact  solution  we  want  to  find  an  e-approximation 
(in  the  sense  of  (2.1)).  Section  8  yields  that  the  cost  of 
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the  mr  algorithm  is 

comp(4»mr)  (c  (A)  +  c.)n(lln  £  -/In  r^-~ J  +  1)  . 

1  E  /PT-1 

Since  A  is  dense,  c(A)  is  proportional  to  n.  Then 
c(A)  +  <  2n  +  0(1).  For  simplicity  we  omit  the  lower 

order  term  and  conclude  that  if 

(9.1)  [in  ItA-  2  ,  ln  €ii J  <  JL  .  ! 

£  SW-1J  6 

then  comp(<J>mr)  <  comp(G).  Equation  (9.1)  exhibits  the  relation 
between  e,  M  and  n  which  guarantees  that  the  mr  algorithm  is 
more  efficient  than  the  Gauss  elimination  algorithm.  Note 
that  for  nil,  (9.1)  holds  provided  that  either  e  is  not 
too  small  or  that  M  is  not  too  large.  Thus  if  we  want  to 
find  an  e-approximation  with  moderate  e  or  if  the  condition 
number  of  the  system  Ax  =  b  is  moderate  then  the  minimal  re¬ 
sidual  algorithm  is  superior  to  the  Gauss  elimination  algo¬ 
rithm. 

We  now  discuss  the  case  of  sparse  matrices.  The  cost  of 
the  Gauss  elimination  algorithm  (in  fact  the  cost  of  any  direct 
algorithm)  for  sparse  matrices  depends  critically  on  the 
structure  of  A.  For  some  favorable  cases,  the  cost  is  pro¬ 
portional  to  n,  for  some  "bad"  cases,  it  can  be  still  propor- 

3 

tional  to  n  .  To  include  all  cases  assume  that  for  the  sparse 
case  the  cost  of  the  Gauss  elimination  algorithm  is 
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a 

comp(G)  =  c2  n 

for  some  6  e  tl,3J  and  a  positive  c^. 

For  the  mr  algorithm,  set  =  c(A)  +  c^.  Since  A  is 
sparse,  c3  is  of  order  unity.  If 

-  1 

✓M  -  1“  C3 

then  comp(<$>mr)  <  comp(G).  If  c2n^/c3  -  1  >  0,  then  (9.2) 
holds  provided  that  either  c  is  not  too  small  or  that  M  not 
too  large.  In  these  cases,  the  mr  algorithm  is  superior  to 
the  Gauss  elimination  algorithm. 

For  example,  set  c2  =  1,  B  =  2,  and  =  20.  Approxi¬ 
mating  the  logarithms ,  (9.2)  can  be  simplified  to 

(9.3)  f  In  |  <  ^  -  1  . 

Then  the  mr  algorithm  is  superior  to  the  Gauss  elimination 
algorithm  provided  (9.3)  holds. 

We  now  compare  storage  requirements.  As  above  we  dis¬ 
tinguish  between  dense  and  sparse  matrices.  For  dense  matrices, 
the  Gauss  elimination  algorithm  requires  storage  proportional 
to  n  .  The  mr  algorithm  uses  storage  proportional  to  n 
plus  storage  required  to  compute  Ax.  Therefore,  if  Ax  can 
be  computed  with  storage  less  than  n  ,  the  mr  algorithm  is 
superior.  For  example,  if  A  can  be  generated,  then  storage 
is  proportional  to  n. 


(9.2) 


In 


i  +  A 


/  In 
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For  sparse  matrices,  the  storage  required  by  the  Gauss 

elimination  algorithm  depends  critically  on  the  structure  of 

2 

A  and  may  vary  from  n  to  n  .  On  the  other  hand,  the 
storage  of  the  mr  algorithm  is  always  proportional  to  n. 
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10.  OPEN  PROBLEMS 

In  this  paper  we  studied  optimal  algorithms  for  the  so¬ 
lution  of  Ax  =  b  using  the  information  operator  Nk(A,b)  = 

[b,  Ab , ...» A  b].  We  have  focused  on  this  information 
operator  because  it  is  widely  used  in  practice  and  because 
it  is  susceptible  to  a  very  thorough  analysis.  It  would  of 
course  be  desirable  to  generalize  results  of  this  paper  to 
more  general  information  operators.  Until  this  is  accom¬ 
plished  we  won't  know  if  N^(A,b)  is  "optimal"  information. 

For  instance,  let 

(10.1)  Nk(A,b)  =  [b,  AZj, Az2, . . . ,  Az^J 

where  z^^  =  z^b,  Az^,...,  Az^  for  i  =  l,  2,...,k.  Tliat 

'  K 

is,  we  still  compute  the  matrix-vector  multiplications 
but  now  the  vector  z ^  is  an  arbitrary  function  of  the  pre¬ 
viously  computed  information.  For  information  (10.1)  we  can 
generalize  the  definition  of  the  optimal  matrix  and  class  in¬ 
dices  in  an  obvious  way.  We  ask  what  is  the  optimal  choice 
of  the  z^,  i.e.,  for  which  z ^  are  the  optimal  indices  mini¬ 

mized.  We  propose 

Con lecture  10.1 

If  F  is  orthogonally  invariant  then  the  optimal  matrix 
and  class  indices  are  minimized  for  the  vectors  z^  =  Ai-1b, 

i»l,  2,..,k.  That  is,  the  information  Nk(A,b)  *  [b,  Ab . Akb  | 

is  optimal  in  the  class  of  information  operators  of  the  form 

(10.1) .  ■ 
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We  now  consider  more  general  information  operators  than 

(10.1) .  Let 

(10.2)  Ns(A,b)  =  [b,  (A;b) ,  Lj (A;b, u^) , . . . ,  Lg (A;b, u^ , . . . , u  j) 

Where  ui  =  Li(A;b,u1 . ui_i)  '  i  =  1-2 _ _  -  1,  and  L..  is  a 

functional  which  depends  linearly  on  the  first  argument.  The 
can  depend  nonlinearly  on  b  and  on  the  previously  com¬ 
puted  information  u^  u2,..,  uil-  Note  that  (10.2)  is  the 
general  form  of  adaptive  linear  information  and  (10.1)  as  well 
as  (2.2)  are  special  examples  of  (10.2).  We  ask  what  is  the 
optimal  adaptive  linear  information,  i.e.  what  functionals 
minimize  the  optimal  matrix  and  class  indices.  In  would  also 
be  interesting  to  know  the  minimal  value  of  s  for  which  we  can 
find  the  exact  solution  of  a  linear  system.  From  Rabin  [72J  we 
can  conclude  that  s  s  (n  +  1) (n  +  2)/2  -  1  with  no  restriction 
on  the  class  F. 

We  also  want  to  pose  a  complexity  problem.  We  showed 

that  for  the  information  Nk(A,b)  =  [b,  Ab . Akb]  there 

exist  algorithms  which  are  optimal  (or  almost  optimal)  and 
which  have  linear  combinatory  complexity.  These  two  proper¬ 
ties  guarantee  finding  an  e-approximation  with  minimal  (or 
almost  minimal )  complexity. 

Let  N  (A,b)  be  an  optimal  adaptive  linear  information 
of  the  form  (10.2 ) .  Does  there  exist  an  almost  optimal  algo¬ 
rithm  using  Ng(A,b)  with  linear  combinatory  complexity?  or 
conversely,  is  it  true  that  if  an  information  operator  is 
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better  that  N^(A,b)  =  fb,  Ab, . . ,  Akb],  then  the  combinatory 

complexity  of  an  almost  optimal  algorithm  cannot  be  linear? 

We  can  establish  one  result  for  N  (A.b).  The  functionals 

s 

in  (10.2)  must  depend  on  b.  Otherwise  the  information 
Ns(A,b)  does  not  supply  enough  knowledge  to  find  an  e-approxi¬ 
mation.  To  show  this  assume  that 

(10.3)  Ng (A, b)  =  tb,  Lj (A) ,  L2 (AjU^) # . . . *  Lg (A; Uj , . . . , ug )  . 

where  u^  =  (A;u^,  . . .  ,u^J  is  independent  of  b.  As  in  (2.8), 

let  k(F)  be  the  minimal  value  of  s  such  that  there  exists  an 

algorithm  which  uses  N  (A.b)  and  finds  an  e-approximation  in 

s 

the  sense  of  (7.1). 

For  simplicity  we  establish  the  desired  result  only  for 

the  class  F^.  Without  loss  of  generality  we  assume  that 

e  s  p  .  (Otherwise  the  algorithm  ♦„(N_(A,b))  *  b  yields  an 

s  s 

e -approximation. ) 


Theorem  10.1 

Let  e  s  p,  F  =  F^  and  p  be  arbitrary.  There  exists  a 
vector  b  such  that 


k(f4) 


n  (n  +  1) 
2 


■ 


Proof 

Let  A  =  I  +  B  where 


ui-i> 


0  , 


(10.4) 


(B|  Ux | « • 


i  *  1#  2  f  »  •  / 


s. 
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and  =  Li  ^ '  ul '  '  ‘  '  ui-l ^  ‘  Note  that  (10.4)  corresponds  to  s 
homogenous  linear  equations  in  coefficients  of  B.  Since  B 
is  an  nxn  symmetric  matrix,  we  have  n(n+l)/2  unknowns. 

If  s  <  n(n  +  l)/2  then  there  exists  a  nonzero  matrix  B  sat¬ 
isfying  (10.4).  We  can  normalize  B  such  thatjjB  ||  =  p.  De¬ 
fine  a  vector  b  such  that  Bb  =  cb  with  c  =  tp.  Let 
A  =  I  -  B.  Then  A  e  F4  and  Ng(A,b)  =  Ng(A,b).  Let  <J>  =  { 0 ^ } 
be  an  algorithm  and  ^  (Ny,  (A,  b)  )  .  Let 


where  a  =  A  *b  and  a  =  A  ^b.  Then  a  =  y~  b, 

||  Ap  a  |  j  =  (1  +  c)p_1,  a  =  y—  b  and  ||  A  a  ||  =  (l-c)p  * .  Let 


xk  =  c^b  +  x 


where  Cj  =  (xk,b)  and  x  is  orthogonal  to  b.  Then 
((I  i  B)px,b)  =  0  and 

||  (I  ±  B)p(xk  -  b)  ||2  2  j  c1(l  ±  c)p  -  (1  ±  c ) p—  1 1  . 

Thus 

a  2  max  (|c^(l+p)-l|,  |  (1  -  p)  -  1 1 )  2  P  -  e  • 

Since  $  is  arbitrary,  this  proves  that  it  is  impossible  to 
find  an  e-approximation  for  s  <  n(n  +  l)/2.  This  completes  the 
proof .  I 

Mote  that  for  the  class  P^,  we  can  recover  the  matrix 
A  ■  knowing  a  suitable  chosen  Ng(A,b)  with  s=n(n  +  l)/2. 
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